Maryland Total Migration Analysis (FY2020-FY2024)¶
Executive Summary¶
This notebook provides exploratory data analysis of Maryland's net migration patterns across fiscal years 2020-2024. The analysis examines migration trends across all Maryland jurisdictions, identifying areas of population gain and loss.
Data Source: Maryland State Database
Database: Maryland
Table: Migration
Fiscal Year Period: July 1 - June 30
Data Type: Net migration (positive = in-migration, negative = out-migration)
1. Setup and Data Loading¶
In [13]:
# Import required libraries
import os
from dotenv import load_dotenv
import mssql_python as mssql
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
import warnings
warnings.filterwarnings("ignore", message="pandas only supports SQLAlchemy connectable")
# Load environment variables
load_dotenv()
# Configure visualization
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
# Configure Plotly for HTML export - include full plotly.js library
pio.renderers.default = "notebook"
import plotly.offline as pyo
pyo.init_notebook_mode(connected=False)
# Database connection
SQL_CONNECTION_STRING = os.getenv("SQL_CONNECTION_STRING")
TABLE_NAME = "[Maryland].[dbo].[Migration]"
print("✓ Libraries imported successfully")
print("✓ Environment configured")
✓ Libraries imported successfully ✓ Environment configured
In [14]:
# Load data from SQL Server
connection = mssql.connect(SQL_CONNECTION_STRING)
cursor = connection.cursor()
query = f"SELECT * FROM {TABLE_NAME} ORDER BY FiscalYear"
df = pd.read_sql(query, connection)
connection.close()
print(f"Data loaded: {len(df)} records")
print(f"Fiscal years: FY{df['FiscalYear'].min()} - FY{df['FiscalYear'].max()}")
df.head()
Data loaded: 5 records Fiscal years: FY2020 - FY2024
Out[14]:
| FiscalYear | PeriodStart | PeriodEnd | Maryland | Allegany | AnneArundel | BaltimoreCity | Baltimore | Calvert | Caroline | Carroll | Cecil | Charles | Dorchester | Frederick | Garrett | Harford | Howard | Kent | Montgomery | PrinceGeorgeS | QueenAnneS | Somerset | StMaryS | Talbot | Washington | Wicomico | Worcester | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | 2020-04-01 | 2020-06-30 | -4982 | -58 | 141 | -2161 | -928 | 106 | -22 | 79 | 81 | 19 | 10 | 913 | -18 | 223 | 149 | -13 | -1732 | -2058 | 127 | -28 | 133 | 36 | -27 | -89 | 135 |
| 1 | 2021 | 2020-07-01 | 2021-06-30 | -8725 | 209 | 784 | -6001 | -2605 | 1200 | 141 | 1362 | 350 | 1662 | 151 | 7118 | 131 | 2008 | 1050 | 311 | -8998 | -11763 | 879 | 92 | 229 | 501 | 693 | 507 | 1264 |
| 2 | 2022 | 2021-07-01 | 2022-06-30 | 1854 | 103 | 765 | -5830 | -2021 | 332 | 89 | 1527 | 1062 | 1280 | 177 | 6739 | 11 | 752 | -145 | 250 | -341 | -6755 | 861 | 218 | -57 | 382 | 858 | 741 | 856 |
| 3 | 2023 | 2022-07-01 | 2023-06-30 | 11010 | 500 | -833 | -3681 | -842 | 58 | 198 | 1246 | 838 | 1646 | 418 | 4920 | 20 | 761 | -268 | 176 | 3158 | -776 | 850 | 406 | 74 | 319 | 1012 | 294 | 516 |
| 4 | 2024 | 2023-07-01 | 2024-06-30 | 34591 | 252 | 416 | 237 | 3376 | -34 | 293 | 404 | 683 | 2080 | 282 | 4324 | 77 | 610 | 1267 | 188 | 8401 | 7059 | 1003 | 264 | 630 | 418 | 1438 | 481 | 442 |
2. Data Structure and Quality Assessment¶
In [15]:
# Data structure overview
print("=" * 80)
print("DATA STRUCTURE")
print("=" * 80)
print(f"\nDataset Shape: {df.shape[0]} rows × {df.shape[1]} columns")
print(f"\nColumn Overview:")
print(df.dtypes)
print("\n" + "=" * 80)
print("DATA QUALITY ASSESSMENT")
print("=" * 80)
# Check for missing values
missing_counts = df.isnull().sum()
if missing_counts.sum() == 0:
print("\n✓ No missing values detected")
else:
print("\nMissing values by column:")
print(missing_counts[missing_counts > 0])
# Check for duplicates
duplicates = df.duplicated(subset=['FiscalYear']).sum()
print(f"\nDuplicate fiscal years: {duplicates}")
# Identify county columns (excluding FiscalYear and Period columns)
county_columns = [col for col in df.columns if col not in ['FiscalYear', 'PeriodStart', 'PeriodEnd']]
print(f"\n✓ {len(county_columns)} jurisdictions in dataset")
================================================================================ DATA STRUCTURE ================================================================================ Dataset Shape: 5 rows × 28 columns Column Overview: FiscalYear int64 PeriodStart object PeriodEnd object Maryland int64 Allegany int64 AnneArundel int64 BaltimoreCity int64 Baltimore int64 Calvert int64 Caroline int64 Carroll int64 Cecil int64 Charles int64 Dorchester int64 Frederick int64 Garrett int64 Harford int64 Howard int64 Kent int64 Montgomery int64 PrinceGeorgeS int64 QueenAnneS int64 Somerset int64 StMaryS int64 Talbot int64 Washington int64 Wicomico int64 Worcester int64 dtype: object ================================================================================ DATA QUALITY ASSESSMENT ================================================================================ ✓ No missing values detected Duplicate fiscal years: 0 ✓ 25 jurisdictions in dataset
3. Statewide Migration Summary¶
In [16]:
# Statewide migration statistics
print("=" * 80)
print("MARYLAND STATEWIDE MIGRATION (FY2020-FY2024)")
print("=" * 80)
summary = df[['FiscalYear', 'Maryland']].copy()
summary.columns = ['Fiscal Year', 'Net Migration']
summary['Migration Type'] = summary['Net Migration'].apply(lambda x: 'In-Migration' if x > 0 else 'Out-Migration')
summary['Absolute Migration'] = summary['Net Migration'].abs()
print("\n", summary.to_string(index=False))
print("\n" + "=" * 80)
print("SUMMARY STATISTICS")
print("=" * 80)
print(f"Total 5-Year Net Migration: {summary['Net Migration'].sum():,}")
print(f"Average Annual Net Migration: {summary['Net Migration'].mean():,.0f}")
print(f"Years with Net In-Migration: {(summary['Net Migration'] > 0).sum()}")
print(f"Years with Net Out-Migration: {(summary['Net Migration'] < 0).sum()}")
print(f"Highest In-Migration: FY{summary.loc[summary['Net Migration'].idxmax(), 'Fiscal Year']} ({summary['Net Migration'].max():,})")
print(f"Highest Out-Migration: FY{summary.loc[summary['Net Migration'].idxmin(), 'Fiscal Year']} ({summary['Net Migration'].min():,})")
================================================================================
MARYLAND STATEWIDE MIGRATION (FY2020-FY2024)
================================================================================
Fiscal Year Net Migration Migration Type Absolute Migration
2020 -4982 Out-Migration 4982
2021 -8725 Out-Migration 8725
2022 1854 In-Migration 1854
2023 11010 In-Migration 11010
2024 34591 In-Migration 34591
================================================================================
SUMMARY STATISTICS
================================================================================
Total 5-Year Net Migration: 33,748
Average Annual Net Migration: 6,750
Years with Net In-Migration: 3
Years with Net Out-Migration: 2
Highest In-Migration: FY2024 (34,591)
Highest Out-Migration: FY2021 (-8,725)
In [17]:
# Visualize statewide migration trend
fig = go.Figure()
fig.add_trace(go.Bar(
x=df['FiscalYear'],
y=df['Maryland'],
marker=dict(
color=df['Maryland'],
colorscale=[[0, '#d62728'], [0.5, '#cccccc'], [1, '#2ca02c']],
line=dict(color='black', width=1)
),
text=df['Maryland'],
textposition='outside',
texttemplate='%{text:,}',
name='Net Migration'
))
fig.update_layout(
title='Maryland Statewide Net Migration by Fiscal Year',
xaxis_title='Fiscal Year',
yaxis_title='Net Migration',
template='plotly_white',
height=500,
hovermode='x unified'
)
fig.add_hline(y=0, line_dash="dash", line_color="black", opacity=0.5)
fig.show()
4. County-Level Migration Analysis¶
In [18]:
# Reshape data for county-level analysis (exclude Maryland state-level total)
county_cols = [col for col in df.columns if col not in ['FiscalYear', 'PeriodStart', 'PeriodEnd', 'Maryland']]
county_data = df.melt(
id_vars=['FiscalYear', 'PeriodStart', 'PeriodEnd'],
value_vars=county_cols,
var_name='County',
value_name='NetMigration'
)
# Calculate total migration by county over all years
total_by_county = county_data.groupby('County')['NetMigration'].sum().sort_values(ascending=False)
print("=" * 80)
print("TOTAL NET MIGRATION BY COUNTY (FY2020-FY2024)")
print("=" * 80)
print("\nTop 10 Counties (Net In-Migration):")
print(total_by_county.head(10).to_string())
print("\n\nBottom 10 Counties (Net Out-Migration):")
print(total_by_county.tail(10).to_string())
print(f"\n\nCounties with net in-migration: {(total_by_county > 0).sum()} of {len(total_by_county)}")
print(f"Counties with net out-migration: {(total_by_county < 0).sum()} of {len(total_by_county)}")
print(f"\n✓ State Total (Maryland): {df['Maryland'].sum():,} net in-migration")
================================================================================ TOTAL NET MIGRATION BY COUNTY (FY2020-FY2024) ================================================================================ Top 10 Counties (Net In-Migration): County Frederick 24014 Charles 6687 Carroll 4618 Harford 4354 Washington 3974 QueenAnneS 3720 Worcester 3213 Cecil 3014 Howard 2053 Wicomico 1934 Bottom 10 Counties (Net Out-Migration): County StMaryS 1009 Allegany 1006 Somerset 952 Kent 912 Caroline 699 Montgomery 488 Garrett 221 Baltimore -3020 PrinceGeorgeS -14293 BaltimoreCity -17436 Counties with net in-migration: 21 of 24 Counties with net out-migration: 3 of 24 ✓ State Total (Maryland): 33,748 net in-migration
In [19]:
# Visualize top gainers and losers among counties
top_n = 10
top_gainers = total_by_county.head(top_n)
top_losers = total_by_county.tail(top_n).sort_values()
fig = make_subplots(
rows=1, cols=2,
subplot_titles=('Top 10 Jurisdictions: Net In-Migration', 'Top 10 Jurisdictions: Net Out-Migration'),
horizontal_spacing=0.15
)
# Top gainers
fig.add_trace(
go.Bar(
y=top_gainers.index,
x=top_gainers.values,
orientation='h',
marker=dict(color='#2ca02c'),
text=top_gainers.values,
texttemplate='%{text:,}',
textposition='outside',
name='In-Migration'
),
row=1, col=1
)
# Top losers
fig.add_trace(
go.Bar(
y=top_losers.index,
x=top_losers.values,
orientation='h',
marker=dict(color='#d62728'),
text=top_losers.values,
texttemplate='%{text:,}',
textposition='outside',
name='Out-Migration'
),
row=1, col=2
)
fig.update_layout(
title_text='5-Year Total Net Migration by County (FY2020-FY2024)',
showlegend=False,
height=600,
template='plotly_white'
)
fig.update_xaxes(title_text='Net Migration', row=1, col=1)
fig.update_xaxes(title_text='Net Migration', row=1, col=2)
fig.show()
5. Temporal Trends by County¶
In [20]:
# Select top 8 counties by total absolute migration for trend analysis
top_counties = total_by_county.abs().nlargest(8).index.tolist()
# Filter county data for selected counties
trend_data = county_data[county_data['County'].isin(top_counties)]
# Create line chart
fig = go.Figure()
for county in top_counties:
county_trend = trend_data[trend_data['County'] == county]
fig.add_trace(go.Scatter(
x=county_trend['FiscalYear'],
y=county_trend['NetMigration'],
mode='lines+markers',
name=county,
line=dict(width=2),
marker=dict(size=8)
))
fig.update_layout(
title='Migration Trends: Top 8 Counties by Migration Volume',
xaxis_title='Fiscal Year',
yaxis_title='Net Migration',
template='plotly_white',
height=600,
hovermode='x unified',
legend=dict(
orientation="v",
yanchor="top",
y=1,
xanchor="left",
x=1.02
)
)
fig.add_hline(y=0, line_dash="dash", line_color="black", opacity=0.3)
fig.show()
6. Year-over-Year Change Analysis¶
In [21]:
# Calculate year-over-year changes for Maryland statewide
df_sorted = df.sort_values('FiscalYear')
df_sorted['YoY_Change'] = df_sorted['Maryland'].diff()
df_sorted['YoY_Change_Pct'] = df_sorted['Maryland'].pct_change() * 100
print("=" * 80)
print("MARYLAND STATEWIDE: YEAR-OVER-YEAR MIGRATION CHANGES")
print("=" * 80)
print("\n", df_sorted[['FiscalYear', 'Maryland', 'YoY_Change', 'YoY_Change_Pct']].to_string(index=False))
print("\n" + "=" * 80)
print("KEY OBSERVATIONS")
print("=" * 80)
print(f"Largest YoY increase: FY{df_sorted.loc[df_sorted['YoY_Change'].idxmax(), 'FiscalYear']} "
f"(+{df_sorted['YoY_Change'].max():,.0f})")
print(f"Largest YoY decrease: FY{df_sorted.loc[df_sorted['YoY_Change'].idxmin(), 'FiscalYear']} "
f"({df_sorted['YoY_Change'].min():,.0f})")
================================================================================
MARYLAND STATEWIDE: YEAR-OVER-YEAR MIGRATION CHANGES
================================================================================
FiscalYear Maryland YoY_Change YoY_Change_Pct
2020 -4982 NaN NaN
2021 -8725 -3743.0 75.130470
2022 1854 10579.0 -121.249284
2023 11010 9156.0 493.851133
2024 34591 23581.0 214.178020
================================================================================
KEY OBSERVATIONS
================================================================================
Largest YoY increase: FY2024 (+23,581)
Largest YoY decrease: FY2021 (-3,743)
7. County Migration Volatility Analysis¶
In [22]:
# Calculate standard deviation (volatility) for each county
volatility = county_data.groupby('County')['NetMigration'].agg([
('Mean', 'mean'),
('StdDev', 'std'),
('Min', 'min'),
('Max', 'max'),
('Range', lambda x: x.max() - x.min())
]).sort_values('StdDev', ascending=False)
print("=" * 80)
print("MIGRATION VOLATILITY BY COUNTY")
print("=" * 80)
print("\nTop 10 Most Volatile Counties (by Standard Deviation):")
print(volatility.head(10).to_string())
print("\n\nTop 10 Most Stable Counties (lowest volatility):")
print(volatility.tail(10).to_string())
================================================================================
MIGRATION VOLATILITY BY COUNTY
================================================================================
Top 10 Most Volatile Counties (by Standard Deviation):
Mean StdDev Min Max Range
County
PrinceGeorgeS -2858.6 7032.198184 -11763 7059 18822
Montgomery 97.6 6414.480595 -8998 8401 17399
BaltimoreCity -3487.2 2620.779121 -6001 237 6238
Frederick 4802.8 2473.992259 913 7118 6205
Baltimore -604.0 2345.949168 -2605 3376 5981
Charles 1337.4 789.545312 19 2080 2061
Howard 410.6 703.540546 -268 1267 1535
Harford 870.8 672.133692 223 2008 1785
AnneArundel 254.6 663.814959 -833 784 1617
Carroll 923.6 641.007254 79 1527 1448
Top 10 Most Stable Counties (lowest volatility):
Mean StdDev Min Max Range
County
QueenAnneS 744.0 350.328417 127 1003 876
Wicomico 386.8 309.748608 -89 741 830
StMaryS 201.8 260.834622 -57 630 687
Allegany 201.2 205.396446 -58 500 558
Talbot 331.2 177.650500 36 501 465
Somerset 190.4 165.863800 -28 406 434
Dorchester 207.6 152.470653 10 418 408
Kent 182.4 121.820770 -13 311 324
Caroline 139.8 117.918192 -22 293 315
Garrett 44.2 59.512184 -18 131 149
8. Heatmap: Migration Patterns Across Time and Geography¶
In [23]:
# Create pivot table for heatmap (counties only)
heatmap_data = county_data.pivot(index='County', columns='FiscalYear', values='NetMigration')
heatmap_data = heatmap_data.sort_values(by=2024, ascending=False)
# Create heatmap
fig = go.Figure(data=go.Heatmap(
z=heatmap_data.values,
x=heatmap_data.columns,
y=heatmap_data.index,
colorscale='RdYlGn',
zmid=0,
text=heatmap_data.values,
texttemplate='%{text:,}',
textfont={"size": 9},
colorbar=dict(title="Net Migration")
))
fig.update_layout(
title='Migration Heatmap: All Counties by Fiscal Year',
xaxis_title='Fiscal Year',
yaxis_title='County',
height=900,
template='plotly_white'
)
fig.show()
9. Key Findings and Summary¶
In [24]:
# Generate comprehensive summary
print("=" * 80)
print("MARYLAND MIGRATION ANALYSIS: KEY FINDINGS (FY2020-FY2024)")
print("=" * 80)
# Statewide findings
total_migration = df['Maryland'].sum()
avg_migration = df['Maryland'].mean()
print(f"\n📊 MARYLAND STATEWIDE MIGRATION (State-Level Total)")
print(f" • Total 5-Year Net Migration: {total_migration:,}")
print(f" • Average Annual Migration: {avg_migration:,.0f}")
print(f" • Trend: Shifted from out-migration (FY2020-2021) to strong in-migration (FY2022-2024)")
print(f" • Peak Year: FY2024 with {df['Maryland'].max():,} net in-migrants")
# Top gainers among counties
print(f"\n🔼 TOP COUNTIES (Net In-Migration)")
top_3 = total_by_county.head(3)
for i, (county, value) in enumerate(top_3.items(), 1):
print(f" {i}. {county}: {value:,}")
# Top losers among counties
print(f"\n🔽 TOP COUNTIES (Net Out-Migration)")
bottom_3 = total_by_county.tail(3).sort_values()
for i, (county, value) in enumerate(bottom_3.items(), 1):
print(f" {i}. {county}: {value:,}")
# Volatility insights among counties
most_volatile = volatility.index[0]
most_stable = volatility.index[-1]
print(f"\n📈 COUNTY VOLATILITY INSIGHTS")
print(f" • Most Volatile County: {most_volatile} (StdDev: {volatility.loc[most_volatile, 'StdDev']:.0f})")
print(f" • Most Stable County: {most_stable} (StdDev: {volatility.loc[most_stable, 'StdDev']:.0f})")
# Distribution summary among counties
gainers = (total_by_county > 0).sum()
losers = (total_by_county < 0).sum()
total_counties = len(total_by_county)
print(f"\n🎯 COUNTY DISTRIBUTION")
print(f" • Counties with net in-migration: {gainers} of {total_counties} ({gainers/total_counties*100:.1f}%)")
print(f" • Counties with net out-migration: {losers} of {total_counties} ({losers/total_counties*100:.1f}%)")
print("\n" + "=" * 80)
print("CONCLUSION")
print("=" * 80)
print(f"""
Maryland experienced a significant migration reversal during FY2020-2024:
STATE-LEVEL PATTERN:
• Early Period (FY2020-2021): Net out-migration totaling -13,707
• Recovery Period (FY2022-2024): Strong rebound with +47,455 net in-migration
• Overall 5-Year Trend: Net gain of {total_migration:,} residents
COUNTY-LEVEL PATTERNS:
• {gainers} of {total_counties} counties experienced net in-migration
• Frederick County led all counties with +{total_by_county.iloc[0]:,} net in-migration
• Baltimore City had largest out-migration at {total_by_county.iloc[-1]:,}
• Most counties showed recovery patterns similar to state-level trends
NOTABLE INSIGHTS:
• FY2024 showed strongest in-migration year ({df['Maryland'].max():,})
• COVID-19 impact visible in FY2020-2021 out-migration
• Suburban/exurban counties (Frederick, Charles, Carroll) showed strongest gains
• Urban core jurisdictions (Baltimore City, Prince George's) experienced losses
""")
================================================================================ MARYLAND MIGRATION ANALYSIS: KEY FINDINGS (FY2020-FY2024) ================================================================================ 📊 MARYLAND STATEWIDE MIGRATION (State-Level Total) • Total 5-Year Net Migration: 33,748 • Average Annual Migration: 6,750 • Trend: Shifted from out-migration (FY2020-2021) to strong in-migration (FY2022-2024) • Peak Year: FY2024 with 34,591 net in-migrants 🔼 TOP COUNTIES (Net In-Migration) 1. Frederick: 24,014 2. Charles: 6,687 3. Carroll: 4,618 🔽 TOP COUNTIES (Net Out-Migration) 1. BaltimoreCity: -17,436 2. PrinceGeorgeS: -14,293 3. Baltimore: -3,020 📈 COUNTY VOLATILITY INSIGHTS • Most Volatile County: PrinceGeorgeS (StdDev: 7032) • Most Stable County: Garrett (StdDev: 60) 🎯 COUNTY DISTRIBUTION • Counties with net in-migration: 21 of 24 (87.5%) • Counties with net out-migration: 3 of 24 (12.5%) ================================================================================ CONCLUSION ================================================================================ Maryland experienced a significant migration reversal during FY2020-2024: STATE-LEVEL PATTERN: • Early Period (FY2020-2021): Net out-migration totaling -13,707 • Recovery Period (FY2022-2024): Strong rebound with +47,455 net in-migration • Overall 5-Year Trend: Net gain of 33,748 residents COUNTY-LEVEL PATTERNS: • 21 of 24 counties experienced net in-migration • Frederick County led all counties with +24,014 net in-migration • Baltimore City had largest out-migration at -17,436 • Most counties showed recovery patterns similar to state-level trends NOTABLE INSIGHTS: • FY2024 showed strongest in-migration year (34,591) • COVID-19 impact visible in FY2020-2021 out-migration • Suburban/exurban counties (Frederick, Charles, Carroll) showed strongest gains • Urban core jurisdictions (Baltimore City, Prince George's) experienced losses